DETAILED ACTION

Specification 

1. The disclosure is objected to because of the following informalities:
On page 6, line 18, "heading" should be --hearing--.
Appropriate correction is required. 

Claim Objections 

2. Claims 3 and 10 are objected to because of the following informalities:

There should be a period at the end of these claims. Appropriate correction is 
required. 

Claim Rejections - 35 USC § 102 

3. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

4. Claims 1, 2, 4 to 7, 11, 12, and 18 are rejected under 35 U.S.C. 102(a) as being 
anticipated by French-St George et al. ('030).

Regarding independent claim 1, French-St George et al. ('030) discloses a
method for management of speech and audio prompts in multimodal interfaces, 
comprising: 

"prompting a user of the device using a combination of a visual prompt and an 
audible prompt, including" - a multi-modal user interface provides a telecommunications 
system and methods to facilitate multiple modes of interface with users using voice, 
hard keys, touch sensitive soft key input, and pen input; the system provides for voice or 
key input of data, and for graphical and speech data output (column 5, lines 53 to 67: 
Figure 3); according to a first embodiment, a multimodal user interface includes a 
speech interface for speech input and output and for accessing a speech recognizer, 
and non-speech interfaces for tactile input and graphical output (column 6, lines 15 to 
23: Figure 3); firstly the system or device prompts the user for input, and receives input 
from the user by one of the number of possible interface modalities (column 7, lines 56 
to 59: Figure 4); 

"presenting a set of input choices" - one example is Yahoo™ Yellow Pages™, 
which is a multi-layer application, where prompts are selected associated with that 
layer; for example, the layers of the Yellow Pages directory are: identify service, identify 
city/state, identify business category, and display business name; the background layer 
will communicate a method of returning to selection of other services, a list of services 
available, and that the user may select a service by touching or speaking, or, if 
appropriate, the service name (column 7, line 64 to column 8, line 6; column 8, lines 34 
to 37: Figures 3 and 5); 

"accepting a command from the user to provide an audible prompt" - if the
speech recognizer is off, the system determines whether a "touch-to-talk" option is 
selected, i.e. receiving input from a touch sensitive button on the display; if yes, the 
speech interface is turned back on, into foreground state and the system issues a 
speech prompt for input; this option provides the user with one way of turning the 
speech prompts back on (column 9, lines 43 to 52: Figure 7); 

"in response to said command, playing an audible prompt that identifies one or 
more of the set of input choices" - if yes, the speech interface is turned back on, into 
foreground state and the system issues a speech prompt for input; this option provides 
the user with one way of turning the speech prompts back on (column 9, lines 43 to 52: 
Figure 7); as indicated in the flow chart, for example, if the speech recognizer is on, 
corresponding speech prompts will be played (column 8, lines 59 to 51: Figure 6); 

"accepting an input from the user in response to the visual and audible prompts" 
- if a command word is received, the system executes the command, or otherwise 
determines if there is a match to city/state to allow a move to the next layer, when there 
is a further prompt for input (column 10, lines 15 to 19); on selection of the business 
sub-category, a user will select a business sub-category from those displayed by touch, 
or speak a selection from the business sub-category within the time recognition window 
(column 8, line 65 to column 9, line 11: Figure 6).

Regarding independent claim 18, French-St George et al. ('030) further
discloses a method for management of speech and audio prompts in multimodal 
interfaces, comprising: 

"a user interface including, a graphical display, a manual input device, an audio 
output device, and an audio input device" - a multi-modal user interface provides a 
telecommunications system and method to facilitate multiple modes of interface with 
users using voice, hard keys, touch sensitive soft key input, and pen input; the system 
provides for voice or key input of data, and for graphical and speech data output 
(column 5, lines 53 to 67: Figure 3); mobile telephone unit 100 comprises a body 200 
which carries a display screen 210 for the graphical user interface ("a graphical 
display"), conventional keypad 220 and other hard keys 222 ("a manual input device"), 
speaker 240 associated with the speech interface for providing speech prompts ("an 
audio output device"), and a speech recognizer to accept and interpret speech input 
("an audio input device")(column 6, lines 1 to 12: Figure 2); 

"a controller coupled to the user interface configured to" - the device comprises 
means for dynamically switching between a background state of the speech interface 
and a foreground state of the speech interface in accordance with a user's input 
modality choice (column 6, lines 15 to 23). 

Regarding claim 2, French-St George et al. ('030) discloses the background
layer will communicate a method of returning to selection of other services, a list of 
services available, and that the user may select a service by touching or speaking, or, if 
appropriate, the service name (column 7, line 64 to column 8, line 6; column 8, lines 34
to 37: Figures 3 and 5); a list of services, or a list of recent locations, "includes 
graphically presenting a list of the set of choices." 

Regarding claim 4, French-St George et al. ('030) discloses if the speech
recognizer is off, the system determines whether a "touch-to-talk" option is selected, i.e. 
receiving input from a touch sensitive button on the display; if yes, the speech interface 
is turned back on, into foreground state and the system issues a speech prompt for 
input; this option provides the user with one way of turning the speech prompts back on 
(column 9, lines 43 to 52: Figure 7); receiving an input from a touch sensitive button on 
the display to activate a "touch-to-talk" option is equivalent to "accepting the command 
from the user to provide an audible prompt includes accepting a manual command." 

Regarding claim 5, French-St George et al. ('030) discloses an embodiment
where the speech recognizer is brought into the foreground state when only a speech 
command is received (Figure 7: "Valid Speech Token?": Yes), so that T-spoken tokens 
are returned (Figure 7: "Return tokens: T-written, T-spoken, T-touched"). 

Regarding claims 6 and 7, French-St George et al. ('030) discloses that, as
indicated in the flow chart, for example, if the speech recognizer is on, corresponding
speech prompts will be played (column 8, lines 59 to 51: Figure 6); for example if the
service layer is selected the display will show "Service selection layer" and play an 
audio speech prompt associated with the service selection layer (column 8, lines 40 to 
42: Figure 3); an audio speech prompt associated with the service selection layer is "an 
audible representation of the one or more of the choices" and "a spoken description of
the choices." 

Regarding claims 11 and 12, French-St George et al. ('030) discloses, according
to a first embodiment, a multimodal user interface including a speech interface for 
speech input and output and for accessing a speech recognizer, and non-speech 
interfaces for tactile input and graphical output (column 6, lines 15 to 23; Figure 3); on 
selection of the business sub-category layer, a user will select a business sub-category 
from those displayed by touch, or speak a selection from the business sub-category 
(column 8, line 66 to column 9, line 2); thus, either manual input or spoken input is 
accepted in response to a set of choices in any layer. 

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 3 and 8 to 10 are rejected under 35 U.S.C. 103(a) as being unpatentable over
French-St George et al. ('030) in view of Gazdzinski.

French-St George et al. ('030) does not expressly disclose that the speech
prompts provide a list of the set of choices, that a speech synthesis algorithm produces 
the spoken description, that the representation of choices includes accessing a stored 
audio representation of the spoken description, or that the spoken description is 
received from a remote location. However, all of these features are common for
interactive voice response systems. 

Gazdzinski teaches a similar "smart" elevator system and method, where a user 
is prompted on an interactive building directory having a speech recognition system and 
other input devices such as a touch pad to determine if the user desires to select the 
floor of a firm that they are trying to locate. (Column 3, Lines 1 to 18) Specifically, the 
user initiates the "Building Directory" function by pressing a function key, whereupon a 
signal is generated which prompts the system with an audible and/or visual query to the 
user, depending upon how the system is pre-configured. For an audible query, the sub- 
system retrieves a pre-stored CELP (or other compressed format) data file from one of 
storage devices 108, 110 ("accessing a stored audio representation of the spoken 
description") and converts that file to an analog audio representation of voice via the 
speech synthesis module 112 ("speech synthesis algorithm"). (Column 8, Lines 29 to 
49: Figure 4) Also, a central server 170 is located remotely from the elevator, and 
various components may be disposed in many different arrangements within the 
system, inter alia, on the server. (Column 7, Line 56 to Column 8, Line 19: Figures 1 
and 3) Thus, it is suggested that the pre-stored CELP data file may be stored on the 
server ("receiving data characterizing the spoken description from a remote location"). 
Finally, the "Building Directory" function provides prompts audibly via the speech 
synthesis module 112 or visually via the display 113. For multiple matching entries, the
audible prompts are produced in a sequential, predetermined order. For example, the 
first matching entry (alphabetically) would be synthesized, followed by the second entry, 
etc., until the user states "Stop", to choose the entry desired. (Column 9, Lines 2 to 16)
Thus, there is disclosed an embodiment for "presenting the set of input choices includes 
audibly presenting a list of the set of choices." 

The objective is to provide information to a person riding in an elevator and to 
determine the location of a person, firm, or store within a building as a convenient 
alternative to building directories posted in the lobby of the building. (Column 1, Lines 
20 to 51) It would have been obvious to one having ordinary skill in the art to provide
the techniques of speech synthesis of stored audio representations from a remote 
location as a spoken list of a set of choices as taught by Gazdzinski in the multi-modal 
interface of French-St. George et al. ('030) for the purpose of providing a convenient 
alternative to a building directory. 

7. Claims 13 to 17 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
French-St George et al. ('030) in view of Kamei et al. 

French-St George et al. ('030) discloses a method of sorting through inputs
when valid tokens are returned for more than one mode, i.e. touch input, pen input, and
speech input. Specifically, French-St George et al. ('030) generally prefers touch input
over speech input when both are received, so spoken input is rejected, unless the 
speech recognizer is on. (Figure 7) Arguably, "the environment" can be read to include 
what sorts of input are received in the context of whether the speech recognizer is in the 
foreground or in the background. However, French-St George et al. ('030) does not 
disclose or suggest an environment including a motor vehicle, wherein the speed of the 
vehicle is monitored, and manual input is rejected when the speed of the vehicle
exceeds a threshold speed. However, Kamei et al. teaches an automatic dial telephone 
usable in a motor vehicle by speech recognition, where a safety confirmation procedure 
determines whether or not a speed detected by a speed detector is smaller than a 
predetermined reference speed. (Column 8, Line 51 to Column 9, Line 7: Figure 4: 
Step 43) The safety confirmation procedure is employed in a variety of contexts, so that 
automatic dialing is performed only under safe driving conditions. (Column 1, Lines 61
to 64) It would have been obvious to one having ordinary skill in the art to inhibit 
manual or spoken input in a motor vehicle in an environment where the speed of the 
vehicle exceeds a threshold speed as suggested by Kamei et al. in the multi-modal 
interface of French-St George et al. ('030) for the purpose of performing automatic
dialing only under safe driving conditions. 



Conclusion 

8. The prior art made of record and not relied upon is considered pertinent to 
Applicant's disclosure.

French-St. George et al. ('711), Smith et al., King, Walters et al., Ito, Everhart et
al., Holzman et al., Wilson, and Douglas disclose related art. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Martin Lerner whose telephone number is (703) 308- 
9064. The examiner can normally be reached on 8:30 AM to 6:00 PM Monday to 
Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (703) 305-9645. The fax phone
number for the organization where this application or proceeding is assigned is 703- 
872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Martin Lerner
Examiner